Sampling zoo integration by artemlunev2000 · Pull Request #1423 · aimclub/FEDOT

artemlunev2000 · 2026-03-18T10:09:02Z

Summary

This PR continues the first stage of Sampling Zoo integration. Chunking and subset strategies are now explicitly separated by strategy_kind. Subset runs follow the standard single‑dataset training path with sample selection, while chunking produces multiple InputData partitions and trains a PipelineEnsemble over them. At prediction time, the ensemble replaces the current pipeline, and existing API behavior is preserved where possible.

Context

github-actions · 2026-03-18T10:10:27Z

Code in this pull request contains PEP8 errors. Please write the /fix-pep8 command in the comments below to create a commit with automatic fixes.

Lopa10ko · 2026-03-18T10:41:43Z

fedot/api/sampling_stage/providers.py

+        if strategy_kind == 'chunking':
+            return self._sample_chunking(
+                factory=factory,
+                features=features,
+                target=target,
+                strategy=strategy,
+                strategy_params=strategy_params,
+                random_state=random_state
+            )
+        elif strategy_kind == 'subset':
+            return self._sample_subset(
+                factory=factory,
+                features=features,
+                target=target,
+                strategy=strategy,
+                strategy_params=strategy_params,
+                random_state=random_state,
+                injectable_params=injectable_params
+            )
+        else:
+            raise ValueError(f'Unsupported sampling strategy kind: {strategy_kind}')


Rewrite all branching of this kind as membership checks against an enumerated type, or as a mapping (dictionary) lookup, to keep it extensible:

strategy_kind in available_strategies
or, in this case,
return available_sample_methods[strategy_kind]

With the approach in the current implementation, adding new sampling strategies later will be inconvenient.
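
A minimal sketch of the mapping-based dispatch suggested in this comment. The function names, the registry name, and the keyword-argument signatures are hypothetical stand-ins, not the PR's actual code; only the strategy kinds and the error message come from the diff above.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for the real sampling routines; their names and
# signatures are assumptions made for illustration only.
def sample_chunking(**kwargs: Any) -> str:
    return 'chunking'


def sample_subset(**kwargs: Any) -> str:
    return 'subset'


# Registry of supported strategy kinds. Supporting a new kind means adding
# one entry here instead of another elif branch.
AVAILABLE_SAMPLE_METHODS: Dict[str, Callable[..., Any]] = {
    'chunking': sample_chunking,
    'subset': sample_subset,
}


def sample(strategy_kind: str, **kwargs: Any) -> Any:
    """Look up the sampling routine by kind and call it."""
    try:
        method = AVAILABLE_SAMPLE_METHODS[strategy_kind]
    except KeyError:
        # Same error text as the else branch in the reviewed diff.
        raise ValueError(
            f'Unsupported sampling strategy kind: {strategy_kind}'
        ) from None
    return method(**kwargs)
```

In the reviewed diff the subset branch takes an extra injectable_params argument, so with this layout each registered routine would need to accept or ignore the arguments it does not use.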

Lopa10ko · 2026-03-18T10:43:12Z

fedot/api/sampling_stage/providers.py

+    def _sample_subset(self,
+                       factory: Any,
+                       features: np.ndarray,
+                       target: np.ndarray,
+                       strategy: str,
+                       strategy_params: Dict[str, Any],
+                       random_state: Optional[int],
+                       injectable_params: Optional[Dict[str, Any]]) -> SamplingProviderResult:
+        n_rows = int(features.shape[0])


Extract this into pure functions outside SamplingProvider.

Lopa10ko · 2026-03-18T10:44:07Z

fedot/api/sampling_stage/providers.py

decouple providers

Lopa10ko · 2026-03-18T10:44:51Z

fedot/api/sampling_stage/executor.py

+    def _execute_chunking(self,
+                          train_data: InputData,
+                          started_at: float,
+                          budget_seconds: float) -> SamplingStageOutput:
+        self._raise_if_budget_exceeded(started_at, budget_seconds)
+        remaining_budget = self._remaining_budget(started_at, budget_seconds)


Extract the _execute_* methods into pure functions; decouple executors.
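
A rough sketch of what extracting the budget helpers into pure functions could look like. The bodies of _raise_if_budget_exceeded and _remaining_budget are not shown in the diff, so the semantics below (remaining time clamped at zero, TimeoutError on exhaustion, an injectable now argument) are assumptions.

```python
import time
from typing import Optional


def remaining_budget(started_at: float,
                     budget_seconds: float,
                     now: Optional[float] = None) -> float:
    """Return the seconds left in the budget, never below zero.

    `now` can be passed explicitly so the function stays deterministic in
    tests. When omitted, time.monotonic() is used, which assumes started_at
    was taken from the same clock.
    """
    current = time.monotonic() if now is None else now
    return max(0.0, budget_seconds - (current - started_at))


def raise_if_budget_exceeded(started_at: float,
                             budget_seconds: float,
                             now: Optional[float] = None) -> None:
    """Raise TimeoutError once no budget is left (assumed behavior)."""
    if remaining_budget(started_at, budget_seconds, now) <= 0.0:
        raise TimeoutError('Sampling stage time budget exceeded')
```

Because neither function touches executor state, each can be unit-tested with fixed timestamps and reused by any executor.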

Lopa10ko · 2026-03-18T10:47:20Z

fedot/api/sampling_stage/executor.py

+        return np.asarray(target)[indices]
+
+    @staticmethod
+    def _partitions_to_input_data_list(partitions: Dict[str, Any],


The _partitions_to_input_data_list method is heavily overloaded. Look at @Romankkl03's work on TensorData: study the new data-flow protocol and adapt the work in this PR to it.

Lopa10ko · 2026-03-18T10:51:40Z

fedot/api/sampling_stage/providers.py

    _SAMPLING_MODULE_CANDIDATES = (
        'sampling_zoo.core.api.api_main',
        'sampling_zoo.api.api_main',
        'core.api.api_main',


Drop the internal dependency right away.
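
One possible reading of this comment is that the provider should import a single, explicitly named module rather than probing several candidate paths. The sketch below illustrates that reading; load_sampling_api and its error message are hypothetical, and the module name is a parameter because the thread does not say which path should remain.

```python
import importlib
from types import ModuleType


def load_sampling_api(module_name: str) -> ModuleType:
    """Import one explicitly named module and fail loudly if it is missing.

    Hypothetical helper: it replaces probing a tuple of candidate paths
    with a single import whose failure is reported clearly.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f'Sampling backend module {module_name!r} is not importable; '
            'install the package that provides it.'
        ) from exc
```

importlib.import_module is part of the standard library; everything else here is illustrative.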

artemlunev2000 added 3 commits March 13, 2026 16:54

feat: divide chunking and subset sampling strategies

43b5f44

test: add all sampling zoo strategies tests

66be6fa

refactor: improve and simplify sampling integration

b7d6f2f

Lopa10ko requested changes Mar 18, 2026

artemlunev2000 wants to merge 3 commits into codex/arch_refactoring from sampling_zoo_integration
